Poster: Jeff Kaplan | Date: Jan 15, 2021 5:38pm
Forum: texts | Subject: Re: Wrong language stated for this newspaper
One is still processing.
I'll look into creating a collection.
Poster: aronsson | Date: Jan 16, 2021 7:08am
Forum: texts | Subject: Re: Wrong language stated for this newspaper
The OCR (full text) looks rather good, but all the Swedish letters Å, Ä and Ö have become A, A and O, so perhaps OCR was not done with language=Swedish? If OCR is rerun now, will it use any new algorithms for detecting and separating newspaper columns? All the metadata mentions is "Ocr - ABBYY FineReader 11.0 (Extended OCR)"; it doesn't say which settings were used for the OCR.
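For checking whether a (re)processed text still has the flattened letters, here is a rough sketch, assuming you have downloaded the full text as a plain string. It simply looks for any å/ä/ö in a long stretch of text; the function names, the 500-letter threshold and the sample phrase are all made up for illustration, not anything the archive provides.

```python
import unicodedata

# The three Swedish letters that were reportedly flattened to A, A and O
SWEDISH_LETTERS = set("åäöÅÄÖ")


def swedish_letter_share(text: str) -> float:
    """Fraction of alphabetic characters that are å, ä or ö (either case)."""
    # NFC folds a decomposed "A" + combining ring into a single "Å"
    text = unicodedata.normalize("NFC", text)
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in SWEDISH_LETTERS for ch in letters) / len(letters)


def looks_flattened(text: str, min_letters: int = 500) -> bool:
    """Flag long text with no å/ä/ö at all (hypothetical threshold)."""
    text = unicodedata.normalize("NFC", text)
    letters = sum(ch.isalpha() for ch in text)
    return letters >= min_letters and swedish_letter_share(text) == 0.0


if __name__ == "__main__":
    bad = "Borsen stangde hogre " * 50   # hypothetical flattened OCR output
    good = "Börsen stängde högre " * 50  # the same made-up phrase, correct letters
    print(looks_flattened(bad), looks_flattened(good))  # prints: True False
```

Any real Swedish newspaper page is long enough that a total absence of å/ä/ö is a strong hint, but a short caption without those letters would be a false alarm, hence the minimum length.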
Since these items are named like "Dagens Industri - March 13, 2015", sorting by title puts March 2015 before November 2014, and sorting by date doesn't help either (the items were added in 2019). However, each item has an identifier such as "dagensindustri-2014-10-09", and sorting by that would make perfect sense, but the Newspapers collection has no option to sort by identifier.
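To illustrate why the identifier is the better sort key, here is a small sketch with three items. Only "dagensindustri-2014-10-09" comes from the thread; the other two identifiers assume the same naming pattern, and the November/October titles are made up for the example.

```python
from datetime import datetime

# Sample (title, identifier) pairs. Only "dagensindustri-2014-10-09" appears in
# the thread; the other identifiers assume the same naming pattern.
items = [
    ("Dagens Industri - March 13, 2015", "dagensindustri-2015-03-13"),
    ("Dagens Industri - November 3, 2014", "dagensindustri-2014-11-03"),
    ("Dagens Industri - October 9, 2014", "dagensindustri-2014-10-09"),
]


def title_date(title: str) -> datetime:
    """Parse the date after ' - ' in a title such as 'March 13, 2015'."""
    # %B expects English month names, i.e. the default C locale
    return datetime.strptime(title.split(" - ", 1)[1], "%B %d, %Y")


# Alphabetical title order compares month names as text: March < November < October
by_title = sorted(items, key=lambda item: item[0])

# The identifier embeds a zero-padded YYYY-MM-DD date after a shared prefix,
# so plain string order is already chronological
by_identifier = sorted(items, key=lambda item: item[1])

# Parsing the title gives the same chronological order, but needs a date parser
by_parsed_title = sorted(items, key=lambda item: title_date(item[0]))

if __name__ == "__main__":
    for title, _ in by_title:
        print("title order:     ", title)
    for _, identifier in by_identifier:
        print("identifier order:", identifier)
```

The identifier works as a sort key with no parsing at all because the year comes first and the month and day are zero-padded, which is exactly what the title format lacks.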
Poster: Jeff Kaplan | Date: Jan 16, 2021 9:36am
Forum: texts | Subject: Re: Wrong language stated for this newspaper
Re-processing with language=Swedish. Check the text in a day or so.
This post was modified by Jeff Kaplan on 2021-01-16 17:36:46